Concept Frequency: A Feature Set Based Text Compression Model
نویسنده
چکیده
A summary is a shorter version of the original. Such a simplification highlights the major points from the much longer subject, such as a text, speech, film, or event. The purpose is to help the audience get the gist in a short period of time. Automatic summarization involves reducing a text document or a larger corpus of multiple documents into a short set of words or paragraph that conveys the main meaning of the text. In this paper, we propose a new multi-document summarization approach that makes use of a feature called ‘concept frequency’ that verifies sentences nearness with a cluster opted. Hence the summarization relevance would be more effective. A novel feature called exemplar is to help simultaneously deals with sentence ranking. A fuzzy medoid-based clustering approach is used to produce sentence clusters or subsets where each of them corresponds to a subtopic of the related topic. Keywords— Concept frequency, exemplar, text Summarization
منابع مشابه
Feature Extraction and Efficiency Comparison Using Dimension Reduction Methods in Sentiment Analysis Context
Nowadays, users can share their ideas and opinions with widespread access to the Internet and especially social networks. On the other hand, the analysis of people's feelings and ideas can play a significant role in the decision making of organizations and producers. Hence, sentiment analysis or opinion mining is an important field in natural language processing. One of the most common ways to ...
متن کاملFeature-Based and Feature-Free Textual CBR: A Comparison in Spam Filtering
Spam filtering is a text classification task to which CaseBased Reasoning (CBR) has been successfuly applied. We describe the ECUE system, which classifies emails using a feature-based form of textual CBR. Then, we describe an alternative way to compute the distances between cases in a feature-free fashion, using a distance measure based on text compression. This distance measure has the advant...
متن کاملA General Investigation on the Combination of Local and Global Feature Selection Methods for Request Identification in Telegram
Nowadays, the use of various messaging services is expanding worldwide with the rapid development of Internet technologies. Telegram is a cloud-based open-source text messaging service. According to the US Securities and Exchange Commission and based on the statistics given for October 2019 to present, 300 million people worldwide used telegram per month. Telegram users are more concentrated in...
متن کاملA Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researches to use...
متن کاملCatching the Drift: Using Feature-Free Case-Based Reasoning for Spam Filtering
In this paper, we compare case-based spam filters, focusing on their resilience to concept drift. In particular, we evaluate how to track concept drift using a case-based spam filter that uses a featurefree distance measure based on text compression. In our experiments, we compare two ways to normalise such a distance measure, finding that the one proposed in [1] performs better. We show that a...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2012